Collocation Extraction: Needs, Feeds And Results Of An Extraction System For German
نویسنده
چکیده
This paper provides a specification of requirements for collocation extraction systems, taking as an example the extraction of noun + verb collocations from German texts. A hybrid approach to the extraction of habitual collocations and idioms is presented, aiming at a detailed description of collocations and their morphosyntax for natural language generation systems as well as to support learner lexicography.
منابع مشابه
Term and Collocation Extraction by Means of Complex Linguistic Web Services
We present a web service-based environment for the use of linguistic resources and tools to address issues of terminology and language varieties. We discuss the architecture, corpus representation formats, components and a chainer supporting the combination of tools into task-specific services. Integrated into this environment, single web services also become part of complex scenarios for web s...
متن کاملExperiments on Candidate Data for Collocation Extraction
The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).
متن کاملTerm Extraction and Mining of Term Relations from Unrestricted Texts in the Financial Domain
In this paper, we present an unsupervised hybrid textmining approach to automatic acquisition of domain relevant terms and their relations. We deploy the TFIDFbased term classification method to acquire domain relevant terms. Further, we apply two strategies in order to learn lexico-syntatic patterns which indicate paradigmatic and domain relevant syntagmatic relations between the extracted ter...
متن کاملTools for Collocation Extraction: Preferences for Active vs. Passive
We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision eval...
متن کاملA Hybrid Approach to Extracting and Classifying Verb+Noun Constructions
We present the main findings and preliminary results of an ongoing project aimed at developing a system for collocation extraction based on contextual morpho-syntactic properties. We explored two hybrid extraction methods: the first method applies languageindepedent statistical techniques followed by a linguistic filtering, while the second approach, available only for German, is based on a set...
متن کامل